Range Trees
Motivation
Given a set of $n$ points $P$ on a real line and a query interval $[x:x’]$ find all the points inside the interval.
Construct a binary search tree from $P$ and perform the following query:
Figure: Example of searching in range tree
- Search for $x$ and $x’$ until we get to $v_{\rm split}$ where the search path splits.
- From the left child of $v_{\rm split}$ we continue search with $x$ and at every node $v$ we where search path goes left we all points in right subtree of $v$.
- Symmetrically we go right from $v_{\rm split}$ searching for $x’$ and taking left subtrees of $v$ respectively.
Definition
Canonical Subset of node $v$ (i.e. $P(v)$): subset of points stored in the leaves of the subtree at $v$.
2-D Range Trees
- The main tree is a balanced binary search tree built $T$ built on the x-coordinates of $P$.
- For any internal node / leaf node $v$ in $T$, the canonical subset $P(v)$ is stored in a balanced binary search tree $T_1(v)$ on the y-coordinates of the points. The node $v$ contains a pointer to the root of $T_1(v)$.
Creation $O(n\log n)$
Note Use presorted $P$
- Create $T_1(v)$ binary search tree.
- If $P$ has only one point then create leaf else split $P$ into two sets $P_{\rm left}$ and $P_{\rm right}$ using $x_{\rm mid}$ median point. Recursively create $v_{\rm left}$ and $v_{\rm right}$ from $P_{\rm left}$ and $P_{\rm right}$ respectively. Create a node with $x_{\rm mid}$ and $v_{\rm left}$ and $v_{\rm right}$ left and right children of this node. Make $T_1(v)$ the associated structure of $v$.
Querying $O(\log ^2n +k)$
- Find $v_{\rm split}$
- If $v_{\rm split}$ is a leaf check point inside it and report if necessary.
- Else
- Follow path to $x$ and perform 1-D range query on the subtrees right of the path. Also check if point stored at the final leaf node $v$ must be reported.
- Similary do for $x’$ and perform 1-D range query on the subtrees left of the path from $v_{\rm split}$. Again check at the end for the point stored at $v$.
Fractional Cascading
If two sets of objects $S_1$ and $S_2$ are stored int sorted arrays $A_1$ and $A_2$. To find keys in $[y:y’]$
- We can binary search for ceil of $y$ in $A_1$ and then walk along the array until the floor of $y’$. Similary for $S_2$. Total time will be $O(k)$ plus two binary searches ($k$ reported objects).
- If $S_2\subseteq S_1$ we can maintain pointers from $A_1$ to $A_2$, i.e. we store the pointer to ceil key for every value in $A_1$. This will avoid the second binary search.
Figure: Layered Range Tree showing only x-coordinates. Full points are given below
Similarly $P(lc(v))\subseteq P(v)$ and $P(rc(v))\subseteq P(v)$. Instead of associated binary trees we will store an array sorted on the y-coordinates. Each value in the array will additionaly store two pointers to $A(lc(v))$ and $A(rc(v))$. This becomes the layered range tree. This makes the query time $O(\log n + k)$.
Figure: The associated arrays with the nodes.
While querying for $[x:x’]\times[y:y’]$:
- We search for $x$, $x’$ and $v_{\rm split}$. At $A(v_{\rm split})$ we we find the ceil entry of $y$.
- While searching in $x$ and $x’$ in main tree we keep track of ceil entry of $y$ by following pointers in constant time.